Asymptotic  properties  of  induced  maximum  likelihood  estimators 
of  non-linear  models  for  item  response  variables: 
the  finite-generic-item-pool  case 


Douglas  H.  Jones 

Advanced  Statistical  Technologies  Corporation 


Abstract 

The progress of modern mental test theory depends very much on the techniques
of maximum likelihood estimation, and many popular applications make use of
likelihoods induced by logistic item response models. While, in reality, item
responses are nonreplicate within a single examinee and the logistic models are
only ideal, practitioners make inferences using the asymptotic distribution of
the maximum likelihood estimator derived as if item responses were replicated
and satisfied their ideal model. This article proposes a sample space
acknowledging these two realities and derives the asymptotic distribution of the
induced maximum likelihood estimator.

This  article  assumes  that  items,  while  sampled  from  an  infinite  set  of 
items,  have  but  a  finite  domain  of  alternate  response  functions:  this  situation 
is  the  case  of  the  finite-generic-item-pool.  Later  articles  will  attempt  to 
remove  this  assumption. 

Using the proposed sample space, the article applies the statistical functional
approach of von Mises to derive the influence curve of the maximum likelihood
estimator; to discuss related robustness properties; and to derive new classes
of resistant estimators. The general purpose of this article is to reveal the
value of these methods for uncovering the relative merits of different item
response functions. Proofs and mathematical derivations are minimized to
increase the accessibility of this complex subject.



1. INTRODUCTION


While maximum likelihood procedures are popular in item response theory (IRT)
(Lord, 1980), their sensitivity to departures from assumptions is serious enough
to warrant cautious use and further study (Wainer-Wright, 1980; Jones, 1982).
The purpose of this article is to explore the behavior of the procedures when
the model is not true.

To apply some of the concepts of robustness theory, we found that some of the
more important concepts required reformulating the maximum likelihood
procedures. In particular, the study of the robustness of the maximum likelihood
estimator (MLE) requires viewing it as a function of the empirical probability
distribution function (PDF). The original formulation of item response theory,
as a regression problem, does not allow the summarization of the data in terms
of an empirical PDF. In §2, we recast the structure of the problem so that the
data can be replaced by an empirical PDF and we reformulate the MLE as a
function of it.

In §3, we derive the asymptotic distribution of the MLE when the true PDF is not
generated by the assumed model. These results are basic to understanding the
sensitivity of the MLE to departures from assumptions. They make heavy use of
von Mises's approach to statistical functions (Fillippova, 1982).

In §4, we apply the asymptotic formulas derived in §3 to three popular item
response models. A measure of goodness-of-fit, compatible with the MLE, is
employed to play the role of the mean squared error (McCullagh-Nelder, 1983;
Pregibon, 1981). These results reveal that certain item response models reverse
the scale of ability.


In §5, we formulate the basic robustness criteria associated with Hampel's
influence curve (IC) (Hampel, 1974; Welsch-Krasker, 1982). We derive a relation
between the IC and the maximum bias of the MLE as the true PDF is varied within
an $\epsilon$-contamination neighborhood of the modeled PDF (Huber, 1981). We also
derive the breakdown point (Huber, 1981) of the MLE for certain types of
departures from the assumptions. The analysis of these criteria shows how the
notion of robustness in IRT is fundamentally different from that in linear and
logistic regression problems.

2.  GENERAL  NOTATION  AND  STRUCTURE 

The basic formulation of IRT based on maximum likelihood is: $u=1$ (correct)
or $u=0$ (incorrect) is observed for each item $i$ with likelihood, given a real
latent parameter $\theta$, equal to

$$h_i(u;\theta) = P_i(\theta)^u\,[1-P_i(\theta)]^{1-u}$$

and with $P_i(\theta)$ the $i$th item response model. The total likelihood based on data
$u_1,u_2,\ldots,u_n$ and models $P_1,P_2,\ldots,P_n$ is:

$$L(\theta; u_1,\ldots,u_n, P_1,\ldots,P_n) = \prod_{i=1}^{n} h_i(u_i;\theta).$$

For robustness studies, we need to allow for the possibility that the item
response models are inaccurate. Thus, we assume that $E(u_i) \neq P_i(\theta)$. But we
retain the assumption of local independence and call $P_i(\theta)$ an operational model.

To accommodate items with different difficulties and discriminating powers
and, simultaneously, apply standard asymptotic theory, we formulate the sample
space as:

(Sample Space) $S = \{(u,x):\ u=0,1;\ x \in X\}$

$X$ = finite set indexing items.

An observation on $S$ is denoted by $s$ or $t$, etc., and is generated by
administering a randomly chosen item, $x$, to obtain a response, $u$.


An arbitrary probability distribution function (PDF) on $S$ is denoted by $\eta$.
A probability distribution over $X$ is denoted by $p$. The conditional probability
distribution of $u$ given $x$ is denoted by $f(u;x)$. For arbitrary $\eta$, there is a $p$
and $f(u;x)$ such that:

$$\eta(s) = f(u;x)\,p(x), \quad s = (u,x).$$

Because $u$ is binary, $f(u;x)$ is Bernoulli with some probability of success, $\Pi^*(x)$,
satisfying:

$$f(u;x) = \Pi^*(x)^u\,[1-\Pi^*(x)]^{1-u}.$$

The empirical PDF for a sample $s_1,s_2,\ldots,s_n$ is defined by denoting
$\delta_s$ to be a point mass at $s$ and

$$\eta_n(t) = n^{-1} \sum_{i=1}^{n} \delta_{s_i}(t).$$

It is a PDF on $S$. The distance between two PDF's $\xi$ and $\eta$ is defined as

$$|\xi-\eta| = \max_{s \in S} |\xi(s)-\eta(s)|.$$
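The empirical PDF and the maximum-deviation distance can be illustrated with a small sketch. The two-item pool, the four observations, and the names `empirical_pdf` and `sup_distance` are hypothetical and serve only to make the definitions concrete.

```python
from collections import Counter

def empirical_pdf(sample, support):
    """eta_n: relative frequency of each point s = (u, x) of the support."""
    counts = Counter(sample)
    n = len(sample)
    return {s: counts[s] / n for s in support}

def sup_distance(xi, eta):
    """|xi - eta| = max over s of |xi(s) - eta(s)|."""
    return max(abs(xi[s] - eta[s]) for s in eta)

# Hypothetical pool of two items (x = 0, 1) and four observations s = (u, x).
support = [(u, x) for x in (0, 1) for u in (0, 1)]
sample = [(1, 0), (0, 0), (1, 1), (1, 1)]
eta_n = empirical_pdf(sample, support)
uniform = {s: 0.25 for s in support}
print(eta_n, sup_distance(eta_n, uniform))
```

Because $S$ is finite, a PDF is just a table of probabilities, which is what makes the summarization of the data by $\eta_n$ possible.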

A parametric family of PDF's on $S$ is defined by $\{\eta_\theta:\ \theta\ \text{real}\}$. Values of $\eta_\theta$
are denoted by $\eta(s;\theta)$. A special type of parametric family is generated by a
set of operational models:

Operational Models: $\{\Pi(\theta;x):\ x \in X;\ \theta\ \text{real}\}$

Parametric Family: $f(u;\theta,x) = \Pi(\theta;x)^u\,\{1-\Pi(\theta;x)\}^{1-u}$

$\eta(s;\theta) = f(u;\theta,x)\,p(x).$

The traditional structure of IRT is related as follows: the $i$th observation
is $u_i$ with model $P_i(\theta)$. Let $x_i$ be the index value of the $i$th chosen item,
where $\Pi(\theta;x_i) = P_i(\theta)$. Let $s_i = (u_i,x_i)$, so that
$\eta(s_i;\theta) = f(u_i;\theta,x_i)\,p(x_i) = h_i(u_i;\theta)\,p(x_i)$.

The likelihood based on the sample $s_1,s_2,\ldots,s_n$ is:

$$\prod_{i=1}^{n} \eta(s_i;\theta) = \prod_{i=1}^{n} h_i(u_i;\theta)\,p(x_i).$$

If $\{p(x_i):\ i=1,\ldots,n\}$ contains no information about $\theta$, MLE's based on the two
likelihoods are identical.

The log-derivative of the parametric PDF is denoted by
$\ell(s;\theta) = (d/d\theta) \log \eta(s;\theta)$. If it exists, a solution of the implicit equation

(Normal Equation) $0 = \sum_{i=1}^{n} \ell(s_i;\theta)$

is denoted by $\hat\theta_n$ and is called the MLE. This equation simplifies with
operational models as follows: the logit of an operational model $\Pi(\theta;x)$ and its
derivative are

$$g(\theta;x) = \log\{\Pi(\theta;x)/[1-\Pi(\theta;x)]\}$$

$$g'(\theta;x) = v(\theta;x)^{-1}\,\Pi'(\theta;x), \quad \text{where}$$

$$v(\theta;x) = \Pi(\theta;x)\,[1-\Pi(\theta;x)].$$

Using the definition of $\eta_\theta$, we have:

$$\ell(s;\theta) = g'(\theta;x)\,[u-\Pi(\theta;x)]$$

and the normal equation becomes

$$0 = \sum_{i=1}^{n} g'(\theta;x_i)\,[u_i-\Pi(\theta;x_i)] = \sum_{i=1}^{n} v(\theta;x_i)^{-1}\,[u_i-\Pi(\theta;x_i)]\,\Pi'(\theta;x_i).$$

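The Fisher information of the parametric PDF, $\eta_\theta$, is

$$I(\theta) = -\sum \ell'(s;\theta)\,\eta(s;\theta) = \sum \ell(s;\theta)^2\,\eta(s;\theta)$$

where the sum is over all $s$ in $S$. This information identity follows from the
total differential of $0 = \sum \ell(s;\theta)\,\eta(s;\theta)$; using
$\eta'(s;\theta) = \eta(s;\theta)\,(d/d\theta)\log \eta(s;\theta) = \eta(s;\theta)\,\ell(s;\theta)$ we have

$$0 = \sum \ell'(s;\theta)\,\eta(s;\theta) + \sum \ell(s;\theta)\,\eta'(s;\theta)
   = \sum \ell'(s;\theta)\,\eta(s;\theta) + \sum \ell(s;\theta)^2\,\eta(s;\theta).$$

Note for computational purposes: $I(\theta) = \sum g'(\theta;x)^2\,v(\theta;x)\,p(x)$.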


The two-parameter logistic (2PL) item response models are characterized by
their logits:

$$g(\theta;x) = a(x)\,[\theta-b(x)],$$

where $a(x) > 0$ and $-\infty < b(x) < \infty$ are the discrimination and difficulty
parameters for item $x$. Hence $\ell(s;\theta) = a(x)\,[u-\Pi(\theta;x)]$ and

$$0 = \sum_{i=1}^{n} a(x_i)\,[u_i - \Pi(\theta;x_i)]$$

is the normal equation. The Fisher information is
$I(\theta) = \sum a(x)^2\,v(\theta;x)\,p(x)$, with the sum over all $x$ in $X$.
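As a numerical check of the information identity in the linear-logit case, the following minimal sketch compares the computational formula for the Fisher information with the expected squared score. The item parameters, the item distribution `p`, and the function names are hypothetical.

```python
import math

def prob_2pl(theta, a, b):
    """Pi(theta; x) for a logit a(x)[theta - b(x)]."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

def score(u, theta, a, b):
    """l(s; theta) = a(x) [u - Pi(theta; x)]."""
    return a * (u - prob_2pl(theta, a, b))

def fisher_information(theta, items, p):
    """I(theta) = sum_x a(x)^2 v(theta; x) p(x)."""
    total = 0.0
    for (a, b), px in zip(items, p):
        pi = prob_2pl(theta, a, b)
        total += a * a * pi * (1.0 - pi) * px
    return total

def expected_squared_score(theta, items, p):
    """sum over s = (u, x) of l(s; theta)^2 eta(s; theta)."""
    total = 0.0
    for (a, b), px in zip(items, p):
        pi = prob_2pl(theta, a, b)
        for u, fu in ((1, pi), (0, 1.0 - pi)):
            total += score(u, theta, a, b) ** 2 * fu * px
    return total

items = [(1.0, -0.5), (1.5, 0.0), (0.8, 1.0)]  # hypothetical (a, b) pairs
p = [0.5, 0.3, 0.2]                            # hypothetical item distribution
theta = 0.3
print(fisher_information(theta, items, p), expected_squared_score(theta, items, p))
```

The two printed values agree up to rounding, as the identity requires when the data follow the model.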

We wish to generalize the normal equation in two ways: first, we want to
show the explicit relation between $\hat\theta_n$ and $\eta_n$; second, we wish to consider
estimators that are more general than MLE's.

We rewrite the normal equation using the empirical PDF as

$$0 = \sum \ell(s;\theta)\,\eta_n(s)$$

where it will be understood that the sum is always over $S$. We see that the MLE
depends explicitly on the empirical PDF; we denote this dependence by

$$\hat\theta_n = \theta(\eta_n).$$

If the empirical PDF is replaced by an arbitrary PDF, the normal equation
defines a general functional relationship, $\theta(\eta)$, between $\theta$ and $\eta$: we call $\theta(\eta)$
a statistical functional.

We define M-type estimators generated by a score function $\psi(s;\theta)$ by the
equation

$$0 = n^{-1} \sum_{i=1}^{n} \psi(s_i;\theta) = \sum \psi(s;\theta)\,\eta_n(s).$$

We see that $\psi(s;\theta) \equiv \ell(s;\theta)$ generates the MLE. We add this generality because our
methods of proof in the next section are really about M-type estimators, with the
MLE results following as a special case. Note that the notion of a statistical
functional applies to M-type estimators also. More definitions that we need
follow.

We define

$$m(\theta,\eta) = \sum \psi(s;\theta)\,\eta(s)$$

for an arbitrary PDF and score function. The derivative of $m(\theta,\eta)$ with respect
to $\theta$ is $m'(\theta,\eta)$. Note that $m'(\theta,\eta_\theta)$ means $m'(\theta,\eta)$ evaluated with $\eta=\eta_\theta$. The
normal equation is $0 = m(\theta,\eta_n)$ and Fisher's information is $I(\theta) = -m'(\theta,\eta_\theta)$ with
$\psi=\ell$. The Newton-Raphson algorithm for solving the normal equation is

$$\theta^{t+1} = \theta^t + m(\theta^t,\eta_n)/[-m'(\theta^t,\eta_n)];$$

if $\psi=\ell$ the Fisher scoring algorithm is

$$\theta^{t+1} = \theta^t + m(\theta^t,\eta_n)/I(\theta^t).$$
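The Newton-Raphson iteration can be sketched for the linear-logit score $a(x)[u-\Pi(\theta;x)]$. The five responses, the item parameters, the starting value, and the stopping rule below are hypothetical choices for illustration, not values from the article.

```python
import math

def prob_2pl(theta, a, b):
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

def m(theta, sample):
    """m(theta, eta_n) = n^-1 sum_i a_i [u_i - Pi(theta; x_i)]."""
    return sum(a * (u - prob_2pl(theta, a, b)) for u, a, b in sample) / len(sample)

def m_prime(theta, sample):
    """m'(theta, eta_n) = -n^-1 sum_i a_i^2 v(theta; x_i)."""
    total = 0.0
    for _, a, b in sample:
        pi = prob_2pl(theta, a, b)
        total += a * a * pi * (1.0 - pi)
    return -total / len(sample)

def newton_raphson(sample, theta=0.0, tol=1e-10, max_iter=50):
    """Iterate theta <- theta + m / (-m') until the step is negligible."""
    for _ in range(max_iter):
        step = m(theta, sample) / -m_prime(theta, sample)
        theta += step
        if abs(step) < tol:
            break
    return theta

# Hypothetical data: one (u, a, b) triple per administered item.
sample = [(1, 1.0, -1.0), (1, 1.2, -0.5), (0, 0.9, 0.0), (1, 1.1, 0.5), (0, 1.3, 1.0)]
theta_hat = newton_raphson(sample)
print(theta_hat, m(theta_hat, sample))
```

With a linear logit the derivative $m'(\theta,\eta_n)$ does not involve the responses, so in this sketch the Newton-Raphson step coincides with a Fisher scoring step in which $p(x)$ is taken to be the observed item frequencies.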

Let $\psi(s;\theta)$ be a given score function and let $\theta_0$ denote the value of $\theta$ that
solves the equation $0 = m(\theta,\eta)$, corresponding to this score function. If the
PDF $\eta$ is a member of some parametric family and satisfies $\eta=\eta_{\theta_1}$ for a given
fixed parameter value $\theta_1$, and if $\theta_0=\theta_1$, then we say that the score function is
unbiased.

If $\psi$ is an unbiased score function, then $0 = m(\theta,\eta_\theta)$ for all $\theta$. This fact
leads to an identity that is analogous to the Fisher information identity
presented previously and is proven in exactly the same way. The identity is:

$$-m'(\theta,\eta_\theta) = \sum \psi(s;\theta)\,\ell(s;\theta)\,\eta(s;\theta).$$

If one replaces $m'(\theta,\eta_n)$ by its expectation under $\eta_\theta$ in the Newton-Raphson
algorithm, one obtains an algorithm that is analogous to Fisher scoring. If $\psi$
is an unbiased score function, then one may use the above identity for $-m'(\theta,\eta_\theta)$
to avoid evaluating the derivative of $\psi$.

An important subclass of unbiased score functions is generated by an
arbitrary weight function $w(\theta;x)$, where

$$\psi(s;\theta) = w(\theta;x)\,[u-\Pi(\theta;x)]\,\Pi'(\theta;x).$$

If we choose

$$w(\theta;x) = v(\theta;x)^{-1}$$

then $\psi(s;\theta) = \ell(s;\theta)$ and we are back to the MLE. Other choices of the weight
function lead to resistant estimators. For example, Jones (1982) suggests
$w(\theta;x) = v(\theta;x)^{h-1}$ with $h>0$ a tuning constant. We can stay in the class of
exponential families with arbitrary response variable $u$, as long as
$\Pi(\theta;x) = E(u\,|\,\theta,x)$ and $w(\theta;x) = \mathrm{var}(u\,|\,\theta,x)^{-1}$ (see Jennrich and Moore, 1975).
Jones' resistant estimator could be generated for these families also by letting
$w(\theta;x) = \mathrm{var}(u\,|\,\theta,x)^{h-1}$. We obtain a more general class of estimators by allowing
the weight functions to depend on the response: $\psi(s;\theta) = w(\theta;s)\,[u-\Pi(\theta;x)]\,\Pi'(\theta;x)$.
Krasker and Welsch (1982) consider these estimators for the general linear
model. Stefanski, Carroll, and Ruppert (1984) consider these estimators for the
logistic model.
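A minimal sketch of the weighted score with $w(\theta;x) = v(\theta;x)^{h-1}$ follows, assuming a linear logit so that $\Pi'(\theta;x) = a(x)\,v(\theta;x)$. The data, the bracket $[-6, 6]$, the value $h=2$, and the use of bisection as the root finder are illustrative assumptions; for $h \neq 1$ the score need not be monotone, so bisection only locates one root inside the bracket.

```python
import math

def prob_2pl(theta, a, b):
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

def weighted_score_sum(theta, sample, h):
    """sum_i w_i [u_i - Pi_i] Pi'_i with w = v^(h-1) and Pi' = a v."""
    total = 0.0
    for u, a, b in sample:
        pi = prob_2pl(theta, a, b)
        v = pi * (1.0 - pi)
        total += v ** (h - 1.0) * (u - pi) * a * v
    return total

def solve_by_bisection(f, lo=-6.0, hi=6.0, tol=1e-10):
    """Find a root of f in [lo, hi]; requires a sign change on the bracket."""
    f_lo = f(lo)
    if f_lo * f(hi) > 0:
        raise ValueError("no sign change on the bracket")
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        f_mid = f(mid)
        if f_lo * f_mid <= 0:
            hi = mid
        else:
            lo, f_lo = mid, f_mid
    return 0.5 * (lo + hi)

# Hypothetical data: one (u, a, b) triple per administered item.
sample = [(1, 1.0, -1.0), (1, 1.2, -0.5), (0, 0.9, 0.0), (1, 1.1, 0.5), (0, 1.3, 1.0)]
theta_mle = solve_by_bisection(lambda t: weighted_score_sum(t, sample, h=1.0))
theta_res = solve_by_bisection(lambda t: weighted_score_sum(t, sample, h=2.0))
print(theta_mle, theta_res)
```

Setting `h=1.0` reproduces the MLE normal equation, which gives a convenient check on the implementation before other values of the tuning constant are tried.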

An algorithm based on Gauss-Newton's algorithm for solving normal equations
with score $\psi(s;\theta) = w(\theta;s)\,[u-\Pi(\theta;x)]\,\Pi'(\theta;x)$ is as follows (see Holland and Welsch,
1977): define $d_i^t = \Pi'(\theta^t;x_i)$ and $w_i^t = w(\theta^t;s_i)$; then

$$\theta^{t+1} = \theta^t + \Big[\sum d_i^t\,w_i^t\,d_i^t\Big]^{-1} \sum d_i^t\,w_i^t\,[u_i-\Pi(\theta^t;x_i)].$$

This algorithm is iteratively reweighted least squares: at convergence
$\hat\theta_n = \theta^\infty$, so if we define the pseudo-observation
$z_i = d_i\hat\theta_n + [u_i - \Pi(\hat\theta_n;x_i)]$, then

$$\hat\theta_n = \Big[\sum d_i\,w_i\,d_i\Big]^{-1} \sum d_i\,w_i\,z_i,$$

with $d_i$ and $w_i$ evaluated at $\hat\theta_n$.

See Pregibon (1981) and McCullagh and Nelder (1983) for a similar algorithm
based on the exponential family with linear predictors. Note that this
algorithm is also identical to Fisher scoring.
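The reweighting iteration and the pseudo-observation identity at convergence can be checked with a short sketch. It assumes a linear logit, so that $d_i = a(x_i)\,v(\theta;x_i)$, and uses the MLE weight $w = v^{-1}$; the data and function names are hypothetical.

```python
import math

def prob_2pl(theta, a, b):
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

def weighted_sums(theta, sample, weight):
    """Return (sum d w r, sum d w d) with d = Pi', r = u - Pi, w = weight(v)."""
    num = den = 0.0
    for u, a, b in sample:
        pi = prob_2pl(theta, a, b)
        v = pi * (1.0 - pi)
        d = a * v                      # Pi'(theta; x) for a linear logit
        w = weight(v)
        num += d * w * (u - pi)
        den += d * w * d
    return num, den

def irls(sample, weight, theta=0.0, tol=1e-12, max_iter=100):
    """theta <- theta + [sum d w d]^-1 sum d w (u - Pi), repeated to convergence."""
    for _ in range(max_iter):
        num, den = weighted_sums(theta, sample, weight)
        step = num / den
        theta += step
        if abs(step) < tol:
            break
    return theta

def pseudo_observation_fit(theta, sample, weight):
    """[sum d w d]^-1 sum d w z, where z_i = d_i theta + (u_i - Pi_i)."""
    num, den = weighted_sums(theta, sample, weight)
    # sum d w z = theta * sum d w d + sum d w (u - Pi)
    return (theta * den + num) / den

def mle_weight(v):
    return 1.0 / v

# Hypothetical data: one (u, a, b) triple per administered item.
sample = [(1, 1.0, -1.0), (1, 1.2, -0.5), (0, 0.9, 0.0), (1, 1.1, 0.5), (0, 1.3, 1.0)]
theta_hat = irls(sample, mle_weight)
print(theta_hat, pseudo_observation_fit(theta_hat, sample, mle_weight))
```

At convergence the weighted least squares fit to the pseudo-observations returns the estimate itself, which is the fixed-point property stated above.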


3. GENERAL ASYMPTOTIC THEORY OF M-TYPE ESTIMATORS


We present consistency and asymptotic normality (AN) results in this section.
In the first part we confine attention to the main results and in the
second part we supply the proofs. Readers may skip the proofs and move on to
the next section. In the main results we discuss conditions for consistency and
AN. We also characterize an approximation to the M-type estimator that is
important for AN results and for the robustness results in §5.

3.1 Main Results: Consistency

Suppose the sample $s_1,s_2,\ldots,s_n$ is IID with PDF $\eta$. The empirical PDF $\eta_n$
satisfies $|\eta_n-\eta| \to 0$ wp1 as $n\to\infty$. This fact gives us the obvious candidate for
the limit of an M-type estimator, $\theta_0$, which solves $0 = m(\theta_0,\eta)$; when does
$\hat\theta_n \to \theta_0$? It is possible that the equation $0 = m(\theta,\eta_n)$ yielding $\hat\theta_n$ has more than
one solution, in which case a consistency result may be about only one of the
possible sequences of M-type estimators. Also, it is possible that the equation
$0 = m(\theta_0,\eta)$ does not have a local solution, in which case a consistency result
would not be useful. Some known results follow.

Huber (1964): Let $\theta_0$ be the unique solution. If $\psi(s;\theta)$ is monotone in $\theta$
for each $s \in S$, then every sequence $\hat\theta_n \to \theta_0$ wp1.

Boos (1977): Let $\theta_0$ be an isolated solution. If $\psi(s;\theta)$ is continuous in $\theta$
for each $s \in S$, then there exists a sequence $\hat\theta_n \to \theta_0$ wp1.

Huber (1967, 1980): Let $\theta_0$ be a unique solution. Let $|m(\theta,\eta)|$ be bounded
from zero as $|\theta| \to \infty$. If $\psi(s;\theta)$ is continuous in $\theta$ for each $s \in S$, then every
sequence $\hat\theta_n \to \theta_0$ wp1.

The  various  conditions  for  consistency  will  be  satisfied  when  we  impose 
stricter  conditions  for  AN. 

Example. For the 2PL model $\Pi(\theta;x)$ is strictly monotone increasing and hence
$\psi(s;\theta) = a(x)\,[u-\Pi(\theta;x)]$ is monotone; the solution $\theta_0$ to

$$0 = m(\theta_0,\eta) = \sum a(x)\,[\Pi^*(x)-\Pi(\theta_0;x)]\,p(x)$$

is unique with $\Pi^*(x)$ arbitrary. Thus for $\eta$ arbitrary, Huber (1964) applies.
If $\eta=\eta_{\theta_1}$ for some fixed $\theta_1$, then $\theta_0=\theta_1$ is the unique solution.
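The limit $\theta_0$ in this example can be computed directly from the equation $0 = \sum a(x)[\Pi^*(x)-\Pi(\theta_0;x)]p(x)$. The sketch below uses bisection, which is justified here because the left-hand side is strictly decreasing in $\theta$; the item parameters, the item distribution, and both sets of true probabilities are hypothetical.

```python
import math

def prob_2pl(theta, a, b):
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

def m(theta, items, p, pi_star):
    """m(theta, eta) = sum_x a(x) [Pi*(x) - Pi(theta; x)] p(x)."""
    return sum(a * (ps - prob_2pl(theta, a, b)) * px
               for (a, b), px, ps in zip(items, p, pi_star))

def limit_theta(items, p, pi_star, lo=-10.0, hi=10.0, tol=1e-12):
    """Bisection for theta_0; valid because m is strictly decreasing in theta."""
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if m(mid, items, p, pi_star) > 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

items = [(1.0, -0.5), (1.5, 0.0), (0.8, 1.0)]  # hypothetical (a, b) pairs
p = [0.5, 0.3, 0.2]                            # hypothetical item distribution
theta_1 = 0.7
pi_model = [prob_2pl(theta_1, a, b) for a, b in items]  # truth generated by the 2PL
pi_other = [0.80, 0.55, 0.45]                           # arbitrary true probabilities
theta_0_model = limit_theta(items, p, pi_model)
theta_0_other = limit_theta(items, p, pi_other)
print(theta_0_model, theta_0_other)
```

When the true probabilities come from the operational model at $\theta_1$, the computed limit recovers $\theta_1$; for arbitrary probabilities it returns whatever value balances the weighted residuals.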

3.2 Main Results: Asymptotic Normality

The first order asymptotic properties of M-type estimators are characterized
by the influence curve. The influence curve (IC) is defined as follows: let
$s \in S$ and $\delta_s$ a point mass at $s$; for $\epsilon>0$,

$$IC(s,\eta,\psi) = \lim_{\epsilon\to 0} \frac{\theta(\eta+\epsilon(\delta_s-\eta))-\theta(\eta)}{\epsilon}.$$

Denoting $\theta(\delta_s;\epsilon) = \theta(\eta+\epsilon(\delta_s-\eta))$, we see that the influence curve is an
ordinary derivative of $\theta(\delta_s;\epsilon)$ evaluated at 0: $IC(s,\eta,\psi) = (d\theta(\delta_s;\epsilon)/d\epsilon)_0$.
For M-type estimators, it will be proved later that for $\xi$ an arbitrary PDF,
$\sum IC(s,\eta,\psi)\,\xi(s) = (d\theta(\xi;\epsilon)/d\epsilon)_0$. This latter characterization allows us in §5
to make an important connection between the bias and the influence curve of
MLE's. Also, for M-type estimators the influence curve is:

(**) $IC(s,\eta,\psi) = \psi(s;\theta_0)/[-m'(\theta_0,\eta)]$

where throughout the remainder of this section $\theta_0$ satisfies $0 = m(\theta_0,\eta)$ and
$m'(\theta_0,\eta) < 0$.

Example. For the 2PL and $\eta$ arbitrary,

$$IC(s,\eta,\psi) = a(x)\,[u-\Pi(\theta_0;x)] \Big/ \sum a(x)^2\,v(\theta_0;x)\,p(x).$$

Normally the denominator would depend explicitly on $\Pi^*(x)$ but does not,
since $g''(\theta;x)=0$. However, it does depend on $\Pi^*(x)$ through $\theta_0$, which
solves $0 = \sum a(x)\,[\Pi^*(x)-\Pi(\theta_0;x)]\,p(x)$.

The primary application we make of the influence curve in this section is
to get a leading term approximation:

(*) $\hat\theta_n - \theta_0 = n^{-1} \sum_{i=1}^{n} IC(s_i,\eta,\psi) + R_n$

where $R_n$ is the remainder term. We show below that $n^{1/2} R_n \to 0$ in probability;
thus the behavior of $n^{1/2}(\hat\theta_n-\theta_0)$ is deduced from the approximation and the
Lindeberg-Levy central limit theorem. The sufficient conditions for AN of M-type
estimators are listed under (C). They are implied by conditions (D) when the
M-type estimator is an MLE:

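(C) There exists an open interval $\Omega_0$ and a constant $c>0$ such that for all $\theta$ in
$\Omega_0$

C-1: $\psi(s;\theta)$; $\psi'(s;\theta)$; $\psi''(s;\theta)$ exist for all $s \in S$, with the first two
continuous in $\theta$;

C-2: $m'(\theta_\epsilon,\eta_\epsilon) < 0$ for all $0 < \epsilon \le 1$ and $|\eta-\xi| \le c$, where $\theta_\epsilon$ solves
$0 = m(\theta,\eta_\epsilon)$ and $\eta_\epsilon = \eta + \epsilon(\xi-\eta)$.

(D) Define $\Omega_0 = \{\theta:\ v(\theta;x)>0 \text{ for all } x \in X\}$; then suppose $\Omega_0$ is not empty and
for all $\theta \in \Omega_0$

D-1: $\Pi'(\theta;x)$; $\Pi''(\theta;x)$; $\Pi'''(\theta;x)$ exist for all $x$, with the first two
continuous in $\theta$;

D-2: $\sum g''(\theta;x)\,[\Pi^*(x)-\Pi(\theta;x)]\,p(x) < \sum g'(\theta;x)^2\,v(\theta;x)\,p(x)$ for $\Pi^*(x)$ in an
open interval for each $x$.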

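Theorem 1. Assume (C); then $n^{1/2}(\hat\theta_n-\theta_0)$ is AN with mean 0 and variance:

$$\sigma_0^2 = \sum IC(s,\eta,\psi)^2\,\eta(s).$$

Corollary. Assume (C-2) and $\eta=\eta_{\theta_1}$, $\theta_1$ fixed. If $\psi$ is unbiased, i.e. $\theta_0=\theta_1$,
then $n^{1/2}[\hat\theta_n-\theta_0]$ is AN with mean 0 and variance:

$$\sigma_0^2 = \sum \psi(s;\theta_0)^2\,\eta(s;\theta_0) \Big/ \Big[\sum \psi(s;\theta_0)\,\ell(s;\theta_0)\,\eta(s;\theta_0)\Big]^2,$$

where $\ell(s;\theta) = (\partial/\partial\theta) \log \eta(s;\theta)$. Hence an unbiased M-type estimator is
efficient if and only if $\psi$ is proportional to $\ell$, or in other words the MLE is
optimal among M-type estimators with unbiased score functions.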
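The variance in Theorem 1 can be evaluated in closed form for the linear-logit MLE, where $IC(s,\eta,\psi) = a(x)[u-\Pi(\theta_0;x)]/\sum a(x)^2 v(\theta_0;x)p(x)$. The sketch below is a hedged illustration with hypothetical items; it takes $\theta_0$ as an input, so in a misspecified case the caller would first have to solve $0 = m(\theta,\eta)$.

```python
import math

def prob_2pl(theta, a, b):
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

def fisher_information(theta, items, p):
    """I(theta) = sum_x a(x)^2 v(theta; x) p(x)."""
    total = 0.0
    for (a, b), px in zip(items, p):
        pi = prob_2pl(theta, a, b)
        total += a * a * pi * (1.0 - pi) * px
    return total

def asymptotic_variance(theta0, items, p, pi_star):
    """sigma_0^2 = sum_s IC(s)^2 eta(s), given theta0 solving 0 = m(theta, eta)."""
    denom = fisher_information(theta0, items, p)  # equals -m'(theta0, eta) when g'' = 0
    num = 0.0
    for (a, b), px, ps in zip(items, p, pi_star):
        pi = prob_2pl(theta0, a, b)
        # E[(u - Pi)^2] under the true success probability Pi*(x)
        second_moment = ps * (1.0 - pi) ** 2 + (1.0 - ps) * pi ** 2
        num += a * a * second_moment * px
    return num / denom ** 2

items = [(1.0, -0.5), (1.5, 0.0), (0.8, 1.0)]  # hypothetical (a, b) pairs
p = [0.5, 0.3, 0.2]                            # hypothetical item distribution
theta_1 = 0.7
pi_model = [prob_2pl(theta_1, a, b) for a, b in items]
var_model = asymptotic_variance(theta_1, items, p, pi_model)
print(var_model, 1.0 / fisher_information(theta_1, items, p))
```

When the true probabilities are generated by the operational model, the two printed numbers agree, which is the efficiency statement of the Corollary with $\psi=\ell$.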

3.3 Proof of Asymptotic Normality

Define for any PDF $\xi$,

$$\eta_\epsilon = \eta + \epsilon(\xi-\eta) \quad \text{and} \quad \theta_\epsilon = \theta(\eta_\epsilon).$$

A second order Taylor series expansion about 0 is

(***) $\theta_\epsilon = \theta_0 + (d\theta/d\epsilon)_0\,\epsilon + \tfrac{1}{2}\,(d^2\theta/d\epsilon^2)_\alpha\,\epsilon^2$, $\quad 0 < \alpha < \epsilon$.

Under conditions (C) we show that this expansion is valid with $\epsilon=1$.
Using the expansion with $\xi=\eta_n$, we obtain the approximation (*), where $R_n$ is
expressed in terms of the second derivative of $\theta_\epsilon$. To show that the IC has the
form (**) and that $n^{1/2} R_n \to 0$ in probability we derive the first two derivatives
as follows.

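Define $M(\theta,\epsilon) = m(\theta,\eta_\epsilon)$. Because $\theta_\epsilon$ satisfies $M(\theta_\epsilon,\epsilon) = 0$ for all $0 \le \epsilon \le 1$,
the first and second total differentials with respect to $\epsilon$ are identically zero
and yield two simultaneous equations involving $d^j\theta/d\epsilon^j$, $j=1,2$:

(1) $(\partial M/\partial\theta)_{\theta_\epsilon}\, d\theta/d\epsilon + \partial M/\partial\epsilon = 0$

(2) $(\partial M/\partial\theta)_{\theta_\epsilon}\, d^2\theta/d\epsilon^2 + [(\partial^2 M/\partial\theta^2)_{\theta_\epsilon}\,(d\theta/d\epsilon) + (\partial^2 M/\partial\theta\,\partial\epsilon)_{\theta_\epsilon}]\, d\theta/d\epsilon + (\partial^2 M/\partial\epsilon\,\partial\theta)_{\theta_\epsilon}\, d\theta/d\epsilon + \partial^2 M/\partial\epsilon^2 = 0.$

Solving equations (1) and (2) for $d^j\theta/d\epsilon^j$, $j=1,2$, and using $\partial^2 M/\partial\epsilon^2 = 0$, we
have from equation (1):

(1') $d\theta/d\epsilon = -(\partial M/\partial\epsilon)\,/\,(\partial M/\partial\theta)_{\theta_\epsilon}$

and from equations (1') and (2):

(2') $d^2\theta/d\epsilon^2 = -[(\partial^2 M/\partial\theta^2)_{\theta_\epsilon}\,(d\theta/d\epsilon)^2 + 2\,(\partial^2 M/\partial\epsilon\,\partial\theta)_{\theta_\epsilon}\,(d\theta/d\epsilon)]\,/\,(\partial M/\partial\theta)_{\theta_\epsilon}$,

with $d\theta/d\epsilon$ given by (1').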

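Now we obtain expression (**) for the IC. Let $\theta_0 = \theta(\eta)$; we have

$$\partial M/\partial\epsilon = (\partial/\partial\epsilon)\, m(\theta_0,\eta_\epsilon) = (\partial/\partial\epsilon) \sum \psi(s;\theta_0)\,[\eta(s) + \epsilon(\xi(s) - \eta(s))]
   = \sum \psi(s;\theta_0)\,[\xi(s) - \eta(s)] = \sum \psi(s;\theta_0)\,\xi(s),$$

where the last equality holds because $\sum \psi(s;\theta_0)\,\eta(s) = m(\theta_0,\eta) = 0$.
Substituting $(\partial M/\partial\theta)_{\theta_0} = m'(\theta_0,\eta)$ and the last expression into (1') we have,
for any PDF $\xi$, $(d\theta/d\epsilon)_0 = \sum \psi(s;\theta_0)\,\xi(s)/[-m'(\theta_0,\eta)]$. Thus for $\xi = \delta_s$, we
have $IC(s,\eta,\psi) = \psi(s;\theta_0)/[-m'(\theta_0,\eta)]$.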
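The closed form for the IC can be compared with a finite-difference derivative of the functional along the contamination path $\eta+\epsilon(\delta_s-\eta)$. The sketch assumes a linear logit, for which $m(\theta,\eta_\epsilon) = (1-\epsilon)\,m(\theta,\eta) + \epsilon\,\psi(s;\theta)$ is strictly decreasing in $\theta$; the items, true probabilities, contaminating point, and step size are hypothetical.

```python
import math

def prob_2pl(theta, a, b):
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

def psi(u, theta, a, b):
    """Score a(x) [u - Pi(theta; x)]."""
    return a * (u - prob_2pl(theta, a, b))

def m(theta, items, p, pi_star):
    """m(theta, eta) = sum_x a(x) [Pi*(x) - Pi(theta; x)] p(x)."""
    return sum(a * (ps - prob_2pl(theta, a, b)) * px
               for (a, b), px, ps in zip(items, p, pi_star))

def solve_decreasing(f, lo=-10.0, hi=10.0, tol=1e-13):
    """Bisection for the root of a strictly decreasing function."""
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if f(mid) > 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

items = [(1.0, -0.5), (1.5, 0.0), (0.8, 1.0)]  # hypothetical (a, b) pairs
p = [0.5, 0.3, 0.2]                            # hypothetical item distribution
pi_star = [0.80, 0.55, 0.45]                   # hypothetical true probabilities
theta_0 = solve_decreasing(lambda t: m(t, items, p, pi_star))

# Closed form (**) at the point s = (u = 1, first item).
a0, b0 = items[0]
minus_m_prime = sum(a * a * prob_2pl(theta_0, a, b) * (1.0 - prob_2pl(theta_0, a, b)) * px
                    for (a, b), px in zip(items, p))
ic_closed = psi(1, theta_0, a0, b0) / minus_m_prime

# Finite difference along eta + eps (delta_s - eta).
eps = 1e-5
theta_eps = solve_decreasing(
    lambda t: (1.0 - eps) * m(t, items, p, pi_star) + eps * psi(1, t, a0, b0))
ic_numeric = (theta_eps - theta_0) / eps
print(ic_closed, ic_numeric)
```

The two values differ only by terms of order $\epsilon$, consistent with the IC being the first derivative of $\theta_\epsilon$ at $\epsilon=0$.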

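With $\xi=\eta_n$, the empirical PDF, the MLE is $\hat\theta_n = \theta(\eta_1)$, where $\eta_1$ is $\eta_\epsilon$ with $\epsilon=1$.
Using the expansion (***) with $\epsilon=1$ we have

$$\hat\theta_n - \theta_0 = n^{-1} \sum_{i=1}^{n} IC(s_i,\eta,\psi) + R_n$$

where

$$R_n = \tfrac{1}{2}\,(d^2\theta/d\epsilon^2)_{\alpha^*} \quad \text{for some } 0<\alpha^*<1.$$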

Now we state the conditions for the expansion (***), and hence the
expression for $R_n$, to be valid (see Serfling, 1980, pp. 43, 215):

(A) Apostol (1957, pg. 96): $(d\theta/d\epsilon)_+$, the right-hand derivative, and
$d^2\theta/d\epsilon^2$ exist everywhere in the open interval (0,1), with the first
continuous in the half-closed interval [0,1).

From expressions (1') and (2') for $d\theta/d\epsilon$ and $d^2\theta/d\epsilon^2$ we have formulated
conditions (B) that satisfy conditions (A):

(B) There exists an open interval $\Omega_0$ such that for all $\theta$ in $\Omega_0$

B-1: $m(\theta,\eta)$, $m'(\theta,\eta)$, $m''(\theta,\eta)$ exist for all $\eta$;

B-2: there exists a constant $c$ such that for all $\xi$ with $|\xi-\eta| \le c$,
$m'(\theta_\epsilon,\eta_\epsilon) < 0$ for all $0 \le \epsilon \le 1$.

To further obtain $n^{1/2} R_n \to 0$, we need to examine the terms in expression (2')
and place appropriate conditions on the score function, $\psi$. The four terms are

$(\partial M/\partial\theta)_{\theta_\epsilon} = m'(\theta_\epsilon,\eta_\epsilon)$

$(\partial^2 M/\partial\theta^2)_{\theta_\epsilon} = m''(\theta_\epsilon,\eta_\epsilon)$

$(\partial M/\partial\epsilon) = m(\theta_\epsilon,\xi-\eta)$

$(\partial^2 M/\partial\epsilon\,\partial\theta)_{\theta_\epsilon} = m'(\theta_\epsilon,\xi-\eta).$

These terms apply to $R_n$ with $\xi=\eta_n$.

We see that the behavior of $R_n$ depends directly on that of $\eta_n$ and in
particular on the differences $\eta_n(s) - \eta(s)$, $s \in S$. We have given condition (C) at the
beginning of this section to keep $m'(\theta_\epsilon,\eta_\epsilon)$ properly away from zero so as to
keep $R_n$ from exploding and to infer its behavior from that of $\eta_n$. The following
two lemmas imply that $n^{1/2} R_n \to 0$ in probability.

Lemma A. Assume conditions (C). Put $\xi=\eta_n$ in (2'). There exist constants $a$
and $n_0$ such that $|d^2\theta/d\epsilon^2| < a\,|\eta_n-\eta|^2$ for all $0<\epsilon<1$ and $n>n_0$ wp1.

Lemma B. Assume $s_1,s_2,\ldots,s_n$ are IID with PDF $\eta$. Then as $n\to\infty$ the following
hold:

a) $\eta_n(s) \to \eta(s)$ for all $s \in S$ wp1;

b) $\{n^{1/2}[\eta_n(s) - \eta(s)]:\ s \in S\}$ converges in law to a Gaussian process with
mean 0 and covariance function:

$$\eta(s)\,[1-\eta(s)] \ \text{ for } s=t, \qquad -\eta(s)\,\eta(t) \ \text{ for } s \neq t;$$

c) $|\eta_n-\eta| \to 0$ wp1;

d) $n^{1/2}|\eta_n-\eta|$ converges in law and hence is bounded in probability.

These two lemmas imply that $n^{1/2} R_n \to 0$ in probability and hence $n^{1/2}(\hat\theta_n-\theta_0)$ is AN,
since lemma A implies $n^{1/2}|R_n| < a\,n^{1/2}\,|\eta_n-\eta| \cdot |\eta_n-\eta|$ and lemma B implies that
$|\eta_n-\eta| \to 0$ while $n^{1/2}|\eta_n-\eta|$ remains bounded in probability.




4. SPECIFIC ASYMPTOTIC BEHAVIOR OF MLE's

Throughout this section $\eta_0$ will denote a true PDF and $\theta_0$ the solution to
$0 = m(\theta,\eta_0)$. The asymptotic behavior of the MLE is obtained from the previous
results on M-type estimators with score function $\ell(s;\theta)$, $s \in S$. We discuss these
aspects: goodness-of-fit, scale reproduction, and Fisher variance.

Most practical operational models satisfy condition (D-1). Thus, we can
say that for situations of interest, MLE's are consistent, $\hat\theta_n \to \theta_0$ as $n\to\infty$, and are
asymptotically normal even when $\eta_0$ is not a member of the parametric family
$\{\eta_\theta:\ \theta\ \text{real}\}$ generated by the set of operational models but satisfies the mild
regularity condition (D-2). We will employ this result with the 1PL, 2PL, and
3PL item response models in the example at the end of this section.

Let $\eta_\theta$ denote a member of a certain parametric family and consider the MLE
associated with the family. If for some $\theta_1$, $\eta_0=\eta_{\theta_1}$, then the MLE is
asymptotically unbiased, meaning $\theta_0=\theta_1$. Let $\xi_\theta$ denote a member of a different
parametric family. If $\eta_0=\xi_{\theta_1}$ then the MLE is asymptotically biased, meaning $\theta_0 \neq \theta_1$.
If $\eta_0$ is not a member of any parametric family then the notion of unbiasedness
has no meaning.

Even if $\eta_0$ is a member of some parametric family not identical to $\{\eta_\theta\}$,
the bias does not hold much information about how good the MLE may be. This is
because the parametrization of the family containing $\eta_0$ is as good as arbitrary
when it is not exactly the one generating the MLE. This leads us to propose a
different notion of accuracy, possibly supplying the information we usually
obtain with measurements of bias.

The information supplied by the bias is obtained from comparison of its
square to the variance, because the mean square error, a measure of total error,
is the sum of the squared bias and variance. When the bias overwhelms the
variance, one usually goes looking for another statistical procedure that can
control the bias. (The counterpart in IRT is the adoption of more complex
item response models.)


A measure that seems to decompose into parts due to "bias" and "variance"
and does not depend on the arbitrary parametrization of a true family of PDF's
is as follows: for two PDF's $\eta$ and $\xi$ define

$$K(\eta,\xi) = \sum \eta(s) \log[\eta(s)/\xi(s)].$$

$K(\eta,\xi)$ is nonnegative and equal to 0 when $\eta=\xi$; with the prime denoting
differentiation with respect to $\theta$, $K'(\eta_0,\eta_\theta) = -m(\theta,\eta_0)$, thus
$K(\eta_0,\eta_\theta)$ is minimized by $\theta_0$. A second order Taylor series expansion gives

$$E\,K(\eta_0,\eta_{\hat\theta_n}) \approx K(\eta_0,\eta_{\theta_0}) - \tfrac{1}{2}\,m'(\theta_0,\eta_0)\,n^{-1}\sigma_0^2,$$

where we have assumed that the MLE has been modified appropriately to have the
moments for the approximation and we have used Theorem 1.

Thus $E\,K(\eta_0,\eta_{\hat\theta_n})$ behaves as a "total error" and $K(\eta_0,\eta_{\theta_0})$ behaves as "bias
squared" when it is compared to the last term on the right-hand side of the above
Taylor series. The quantity $K(\eta_n,\eta_{\hat\theta_n})$ is proportional to the deviance in
generalized linear models (see McCullagh and Nelder, 1983) and serves as a
goodness-of-fit statistic. Values of $E\,K(\eta_0,\eta_{\hat\theta_n})$ and its approximate
components are displayed in Table 3 and discussed in the example at the end
of this section.
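The two components of the expansion can be computed numerically for a linear-logit operational model. This is a sketch under stated assumptions: the true PDF and the model share the same item distribution $p(x)$ (so $p$ cancels inside the logarithm), and the items, true probabilities, and sample size are hypothetical rather than taken from the tables.

```python
import math

def prob_2pl(theta, a, b):
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

def kl(theta, items, p, pi_star):
    """K(eta_0, eta_theta) = sum_s eta_0(s) log[eta_0(s) / eta(s; theta)]."""
    total = 0.0
    for (a, b), px, ps in zip(items, p, pi_star):
        pi = prob_2pl(theta, a, b)
        total += px * (ps * math.log(ps / pi)
                       + (1.0 - ps) * math.log((1.0 - ps) / (1.0 - pi)))
    return total

def m(theta, items, p, pi_star):
    return sum(a * (ps - prob_2pl(theta, a, b)) * px
               for (a, b), px, ps in zip(items, p, pi_star))

def solve_decreasing(f, lo=-10.0, hi=10.0, tol=1e-12):
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if f(mid) > 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def variance_term(theta_0, items, p, pi_star, n):
    """-(1/2) m'(theta_0, eta_0) sigma_0^2 / n for the linear-logit MLE."""
    minus_m_prime = 0.0
    second = 0.0
    for (a, b), px, ps in zip(items, p, pi_star):
        pi = prob_2pl(theta_0, a, b)
        minus_m_prime += a * a * pi * (1.0 - pi) * px
        second += a * a * (ps * (1.0 - pi) ** 2 + (1.0 - ps) * pi ** 2) * px
    sigma2 = second / minus_m_prime ** 2
    return 0.5 * minus_m_prime * sigma2 / n

items = [(1.0, -0.5), (1.5, 0.0), (0.8, 1.0)]  # hypothetical (a, b) pairs
p = [0.5, 0.3, 0.2]                            # hypothetical item distribution
pi_star = [0.80, 0.55, 0.45]                   # hypothetical true probabilities
theta_0 = solve_decreasing(lambda t: m(t, items, p, pi_star))
bias_squared = kl(theta_0, items, p, pi_star)
total_error = bias_squared + variance_term(theta_0, items, p, pi_star, n=16)
print(bias_squared, total_error)
```

The ratio `bias_squared / total_error` plays the role of the proportion of total error due to systematic misfit in this toy setting.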

Information of a different nature than bias, applicable to arbitrarily
parametrized families, is obtained by comparing the rank-order of estimated
parameters with the rank-order of known abilities. Suppose there is a certain
parametric family $\{\xi_\theta\}$ of PDF's with the property that $\eta_0 \in \{\xi_\theta\}$, where $\eta_0$ may be
generated by any member of a population of examinees. But the family $\{\xi_\theta\}$ is
too complex, making its calibration unstable with reasonable sample sizes of
examinees. Thus, we prefer instead to use a more parsimonious family $\{\eta_\theta\}$, with
the MLE obtaining $\theta_0$ as a limit when $\eta_0=\xi_{\theta_1}$, $\theta_1$ fixed, and $n\to\infty$. Previous
discussion implies that $\theta_0 \neq \theta_1$, in general; but the bias here is nonsense. What
is useful is a measure of the distortion between $\theta_0$ and $\theta_1$ as $\theta_1$ moves
throughout the population. We do not propose a measure, but we do think that one
should be sensitive to reversals in the "$\theta$-scale". Table 3 displays reversals
for the 1PL and 2PL families and is further discussed in the example at the end
of this section.
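One simple way to look for scale reversals is to compare the rank-order of the limiting values $\theta_0$ with the known ability order. The sketch below computes a Spearman rank correlation and counts adjacent reversals; the five limiting values are hypothetical and are not taken from Table 3, and the rank function assumes no ties.

```python
def ranks(values):
    """Rank from 1 (smallest) to n; assumes no ties."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    r = [0] * len(values)
    for rank, i in enumerate(order, start=1):
        r[i] = rank
    return r

def spearman(x, y):
    """Spearman rank correlation via 1 - 6 sum d^2 / (n (n^2 - 1))."""
    rx, ry = ranks(x), ranks(y)
    n = len(x)
    d2 = sum((i - j) ** 2 for i, j in zip(rx, ry))
    return 1.0 - 6.0 * d2 / (n * (n * n - 1))

def adjacent_reversals(theta_0_values):
    """Count neighbouring subjects (in true ability order) whose limits decrease."""
    return sum(1 for lo, hi in zip(theta_0_values, theta_0_values[1:]) if hi < lo)

true_rank = [1, 2, 3, 4, 5]                   # subjects ordered by true ability
theta_0_values = [-0.4, -0.9, 0.1, 0.6, 1.2]  # hypothetical limiting values
rho = spearman(true_rank, theta_0_values)
n_rev = adjacent_reversals(theta_0_values)
print(rho, n_rev)
```

A correlation of 1 with zero reversals would indicate that the operational model preserves the ability order; anything less signals a distortion of the kind discussed above.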

Turning from bias to variance, we will now consider the predicament of
approximating the true asymptotic variance of an MLE when we do not know $\eta_0$. A
ready approximation to $\sigma_0^2$ of Theorem 1 in §3 is the reciprocal of Fisher's
information: $I(\theta_0)^{-1}$ (§2). But we know from §3 that it is not valid when $\eta_0$ is
not a member of the parametric family of PDF's that generates the MLE.

In general, $\sigma_0^2$ does not majorize $I(\theta_0)^{-1}$ or vice versa. Thus, it is
possible that the reciprocal of Fisher's information can give either a
conservative or misleading approximation to the true asymptotic variance. Table 3
displays the true variance alongside the Fisher variance to show both the good
and the bad; we discuss this further in the example.

Example. Listed in Table 1 are the true and modeled response probabilities of
five subjects on four ASVAB items. The true probabilities were actually
obtained from a very complex item response model which was calibrated on a very
large population. The subjects are ranked from lowest to highest going left to
right. The modeled response probabilities follow the 1PL, 2PL, or 3PL item
response models as indicated; they too were calibrated on a very large
population; Table 2 lists the values of the calibrated parameters. Table 3 is a
summary of the asymptotic features of the respective MLE's, which we discuss as
follows.

First, note the magnitudes of the traditional notion of bias by taking
$|\theta_1 - \theta_0|$ differences from columns (1) and (2). These values could easily change
upon reparametrization of the true response model; thus they are arbitrary.
Thus column (1) should only convey the rank-order of the subjects.

Our  notion  of  "bias  squared"  is  found  in  column  (5),  K(no>6())-  These 
values  will  not  change  if  another  parametrization  were  imposed  on  the  true 
model.  The  worst  fit  is  found  with  subject  5(2PL),  referring  back  to  Table  1  we 
can  see  that  the  2PL  model  provides  poor  estimates  of  all  item  response  probabi¬ 
lities.  There  are  three  good  values;  for  example,  subject  5(3PL)  for  which 
Table  1  shows  good  estimates  of  item  response  probabilities. 

Column (3), $-m'(\theta_0, \eta_0)$, gives us a feel for the curvature of the
likelihood since $n\,m'(\theta_0, \eta_0)$ is an estimate of
$n\,m'(\theta_n, \eta_n)$, the second derivative of the log-likelihood. We see
that the likelihood would tend to be flat for subjects 1, 2 (3PL) even though
$K(\eta_0, \eta_{\theta_0})$ shows close agreement between the estimated and
true item response probabilities.

We present the "total error" $E\,K(\eta_0, \eta_{\theta_n})$, column (4), for a
sample size of n=16, each item type represented equally. These errors appear to
be equal across models and subjects, with the exception of subject 5 (2PL) as
noted before.

The components of the total error are in columns (5) and (6), which can tell us
the proportion of the total error due to systematic bias: (5)/(4). The worst
proportion is found with subjects 1, 2 (1PL), meaning that the 1PL is
inadequate with these subjects.

We may average the "total error" and "bias squared,"
$E\,K(\eta_0, \eta_{\theta_n})$ and $K(\eta_0, \eta_{\theta_0})$ respectively,
over the subjects to get an overall assessment. These averages are, for the
1PL, 2PL, and 3PL models respectively: (error, bias²) = (.06, .03), (.06, .03),
(.04, .01). We see that on average at least 25 percent of the total error is
systematic bias.
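The ratio (5)/(4) can be checked directly from the averaged figures quoted
above. The following is a minimal sketch; the dictionary names are illustrative
only and the numbers are exactly those given in the text.

```python
# Averaged (total error, bias squared) per model, as quoted in the text.
averages = {"1PL": (0.06, 0.03), "2PL": (0.06, 0.03), "3PL": (0.04, 0.01)}

# Proportion of total error that is systematic bias: column (5) / column (4).
bias_share = {model: bias2 / error for model, (error, bias2) in averages.items()}

for model, share in bias_share.items():
    print(f"{model}: {share:.0%} of total error is systematic bias")
```

The smallest share is that of the 3PL, which is where the "at least 25 percent"
figure comes from.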


The presence of reversals of the $\theta_0$-scale can be detected from column
(2). Both the 1PL and 2PL item response models have reversals at the lower
abilities. This happens because the 1PL and 2PL calibrations achieve good fit
to the true response probabilities by distorting the $\theta_0$-scale. The
Spearman rank correlations between the true ability rank-order and the
$\theta_0$-scale rank-order of the 1PL, 2PL, and 3PL models are respectively:
0.60, 0.67, 1.00. These numbers may be interpreted as an alternative
theoretical goodness-of-fit; since we never would know the true rank-order of
ability, there is no practical gain in the measure.
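To make the effect of a scale reversal on the Spearman rank correlation
concrete, here is a small sketch for five subjects. The fitted ranking below is
hypothetical and is not taken from Table 3; it only shows how a reversal among
the lowest abilities lowers the coefficient.

```python
def spearman_rho(rank_a, rank_b):
    """Spearman rank correlation for two untied rankings of equal length."""
    n = len(rank_a)
    d2 = sum((a - b) ** 2 for a, b in zip(rank_a, rank_b))
    return 1.0 - 6.0 * d2 / (n * (n * n - 1))

true_rank = [1, 2, 3, 4, 5]
# Hypothetical fitted ranking with a reversal among the three lowest subjects.
fitted_rank = [3, 2, 1, 4, 5]

rho_reversed = spearman_rho(true_rank, fitted_rank)
rho_perfect = spearman_rho(true_rank, true_rank)
print(rho_reversed, rho_perfect)
```

With only five subjects, a single reversal at the low end is enough to move the
coefficient well away from 1.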

We compare the true variance and the Fisher variance by using columns (7) and
(8). For the most part, the Fisher variance yields a conservative assessment of
precision; however, it can also be misleading, as with subjects 3, 5 (2PL).

Remarks. 1) Column (3), $-m'(\theta_0, \eta_0)$, can play the role of
information. An empirical assessment could be $-m'(\theta_n, \eta_n)$. Also,
ratios could play the role of relative efficiency.

2) One should be cautious even if measures of fit, such as
$K(\eta_n, \eta_{\theta_n})$, are favorable because, as the example shows, it is
possible to have reversals of the $\theta_0$-scale even if the fit is good.

3)  We  have  refrained  from  making  an  elaborate  comparison  of  the  1PL, 
2PL,  and  3PL  models  based  on  the  data,  because  one  needs  to  properly 
account  for  sampling  variability  of  the  calibration  process.  Such  a 
study  is  reported  in  Jones,  Wainer  and  Kaplan  (1984). 

5.  SPECIFIC  ROBUSTNESS  OF  THE  MLE 

Let $\eta$ denote an arbitrary true PDF, $\eta_0$ some fixed PDF, and
$\{\eta_\theta\}$ a parametric family of PDFs that induces the MLE. Let
$\theta(\eta)$ denote the solution to $0 = m(\theta, \eta)$. From Theorem 1 in
§3 we have that $\theta_n \to \theta(\eta)$ with asymptotic variance
$\sigma^2 = \sigma^2(\eta)$. Note that we will not assume that $\eta$ or
$\eta_0$ belongs to $\{\eta_\theta\}$.

The asymptotic bias of the MLE relative to $\eta_0$ and $\{\eta_\theta\}$ is
defined as $|\theta(\eta) - \theta(\eta_0)|$. Let $P_\epsilon$ denote an
$\epsilon$-neighborhood of $\eta_0$; for $\eta$ belonging to $P_\epsilon$ we
want to quantify the degradation of bias and variance. We say that the
robustness of the MLE is measured by the amount of degradation of the maximum
bias

$$b(\epsilon) = \sup_{\eta \in P_\epsilon} |\theta(\eta) - \theta(\eta_0)|$$

and the maximum variance

$$v(\epsilon) = \sup_{\eta \in P_\epsilon} \sigma^2(\eta).$$

If $b(\epsilon)$ were large relative to $v(\epsilon)$, then the maximum variance
would not be a very important quantifier of robustness. We confine study to
$b(\epsilon)$ in this paper.

There are several important notions for quantifying the robustness of an
estimator. Among them are the sensitivities of a parametric estimator, a
fitted value, or a predicted value when one observation is deleted from the
sample. These measures are called, respectively, gross error sensitivity
(Huber, 1981), change in fit sensitivity and prediction sensitivity (Krasker and
Welsch, 1983). The gross error sensitivity is related directly to the maximum
bias as shown below. We formulate these quantities and demonstrate their use
with the 1PL, 2PL, and 3PL item response models.

Another robustness notion is the sensitivity of the maximum bias as $\epsilon$
is varied. Certain values of $\epsilon$ can cause the maximum bias to explode;
the smallest such value is called the breakdown point (Huber, 1981). We
formulate this quantity and demonstrate its use with the 1PL, 2PL, and 3PL
models also.


5.1  Sensitivities  Based  on  Deletion 

The gross error sensitivity is defined as

$$\gamma^* = \max_s |IC(s, \eta_0, \psi)|.$$

From the leading term approximation of section 3.2, we see that it is
proportional to the maximal influence exerted by any one observation on the
error of estimation, $\theta_n - \theta(\eta_0)$. It is related to the maximum
bias $b(\epsilon)$ as follows. Recall from §3.2 that
$\epsilon^{-1}[\theta(\eta_0 + \epsilon(\xi - \eta_0)) - \theta(\eta_0)] \to \sum_s IC(s, \eta_0, \psi)\,\xi(s)$
as $\epsilon \to 0$. Let $P_\epsilon$ be the $\epsilon$-contamination
neighborhood defined by
$P_\epsilon = \{\eta : \eta = \eta_0 + \epsilon(\xi - \eta_0),\ \xi \text{ an arbitrary PDF}\}$.
Then

$$\sup_{\eta \in P_\epsilon} |\theta(\eta) - \theta(\eta_0)| = \epsilon\, \sup_{\xi} \Big|\sum_s IC(s, \eta_0, \psi)\,\xi(s)\Big|.$$

Thus

$$b(\epsilon) \cong \epsilon\,\gamma^*,$$

so that for small $\epsilon$, $\gamma^*$ measures the rate of growth of the
maximum bias over the $\epsilon$-contaminated neighborhood.
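The first-order relation between contamination and bias can be checked
numerically. The sketch below is illustrative only: it assumes a hypothetical
pool of three equally weighted 1PL items (so $g' = 1$), a contaminating PDF
that keeps the same item weights but answers every item correctly, and it
solves $0 = m(\theta, \eta)$ by bisection. It compares the exact shift in
$\theta(\eta)$ with $\epsilon$ times the averaged influence curve for this one
contaminating PDF; it does not search for the supremum.

```python
import math

def logistic(t):
    return 1.0 / (1.0 + math.exp(-t))

# Hypothetical finite pool: 1PL items (a = 1) with difficulties b, equally weighted.
difficulties = [-1.0, 0.0, 1.0]
weights = [1.0 / 3.0] * 3
theta0 = 0.0

def solve_theta(resp_prob):
    """Solve 0 = sum_x [resp_prob(x) - Pi(theta; x)] p(x) for theta by bisection."""
    def m(theta):
        return sum(p * (q - logistic(theta - b))
                   for p, q, b in zip(weights, resp_prob, difficulties))
    lo, hi = -20.0, 20.0              # m is decreasing in theta
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if m(mid) > 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

model_prob = [logistic(theta0 - b) for b in difficulties]
contam_prob = [1.0, 1.0, 1.0]         # contaminating PDF: every item answered correctly

# Averaged influence: sum_s IC(s) xi(s) = sum_x [xi-prob - Pi] p / sum_x v p  (a = 1).
info = sum(p * q * (1.0 - q) for p, q in zip(weights, model_prob))
avg_ic = sum(p * (c - q) for p, c, q in zip(weights, contam_prob, model_prob)) / info

eps = 0.01
mixed = [(1.0 - eps) * q + eps * c for q, c in zip(model_prob, contam_prob)]
exact_shift = solve_theta(mixed) - theta0
linear_shift = eps * avg_ic
print(exact_shift, linear_shift)
```

For small $\epsilon$ the two numbers should agree closely; the gap widens as
$\epsilon$ grows, which is the sense in which the relation is only a
leading-term approximation.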

For M-type estimators, $\gamma^* = \infty$ is equivalent to a zero breakdown
point, meaning that any departure from $\eta_0$ will cause the maximum bias to
explode. Either condition also implies that the estimator is not continuous at
$\eta_0$ when viewed as a function of $\eta$ (assuming, of course, a
complementary topology on the set of PDFs). An estimator is qualitatively
robust if it is continuous (Huber, 1981); thus an M-type estimator is not
robust if $\gamma^* = \infty$ or the breakdown point is zero.

The gross error sensitivity also measures the maximum change in the estimator
caused by deleting one observation. Let $\eta_n(1)$ and $\eta_n(0)$ denote the
empirical PDF with and without $s_i$. Let $\theta_n(1)$ and $\theta_n(0)$
denote the corresponding MLEs. Then using the direct definition of the
influence curve (§3.2) with $s = s_i$, $\eta = \eta_n(0)$ and $\epsilon = 1/n$
it is easy to show

$$\theta_n(1) - \theta_n(0) \cong n^{-1}\, IC(s_i, \eta_n(0), \psi).$$

Thus

$$\max_i |\theta_n(1) - \theta_n(0)| \cong n^{-1}\, \gamma^*.$$
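The deletion approximation can be illustrated with a small numerical sketch.
Everything specific here is an assumption made for illustration: a 1PL model
with known difficulties (so $g' = 1$), a hypothetical test of n = 100 items
with an alternating right/wrong response pattern, and a bisection solver for
the likelihood equation.

```python
import math

def logistic(t):
    return 1.0 / (1.0 + math.exp(-t))

def mle(difficulties, responses):
    """1PL MLE of theta with known difficulties: solve sum_j [u_j - Pi_j(theta)] = 0."""
    lo, hi = -20.0, 20.0
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        score = sum(u - logistic(mid - b) for b, u in zip(difficulties, responses))
        if score > 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

n = 100
# Hypothetical data: difficulties spread over [-2, 2], alternating right/wrong answers.
b_all = [-2.0 + 4.0 * j / (n - 1) for j in range(n)]
u_all = [1 if j % 2 == 0 else 0 for j in range(n)]

i = 0  # index of the observation s_i to delete
b_rest, u_rest = b_all[:i] + b_all[i + 1:], u_all[:i] + u_all[i + 1:]

theta_with = mle(b_all, u_all)        # theta_n(1)
theta_without = mle(b_rest, u_rest)   # theta_n(0)

# Influence curve at the empirical PDF without s_i (1PL, so g' = 1).
info = sum(logistic(theta_without - b) * (1.0 - logistic(theta_without - b))
           for b in b_rest) / (n - 1)
ic = (u_all[i] - logistic(theta_without - b_all[i])) / info

exact_change = theta_with - theta_without
approx_change = ic / n
print(exact_change, approx_change)
```

Repeating the comparison for each index i and taking the largest absolute
change gives an empirical counterpart of $n^{-1}\gamma^*$.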

The change in fit sensitivity concerns the effect of deleting one observation,
$s_i$, on the estimated logit, $g(\theta_n; x_i)$. This change in fit is
$g(\theta_n(1); x_i) - g(\theta_n(0); x_i) \cong g'(\theta_n(0); x_i)[\theta_n(1) - \theta_n(0)]$.
Putting this together with the estimator sensitivity we have

$$g(\theta_n(1); x_i) - g(\theta_n(0); x_i) \cong n^{-1}\, g'(\theta_n(0); x_i)\, IC(s_i, \eta_n(0), \psi).$$

Thus the shape of $g'(\theta; x)\,IC(s, \eta, \psi)$ would indicate robustness,
as would the size of the change in fit sensitivity:

$$\gamma^{**} = \max_s |g'(\theta; x)\,IC(s, \eta, \psi)|.$$

Prediction sensitivity concerns the effect of deleting an observation from the
sample on the predicted logit of a future item, $g(\theta_n; z)$, where $z$ is
yet to be administered. Let $\lambda = g'(\theta; z)$; then by a Taylor series
approximation, $g(\theta_n; z) \cong g(\theta; z) + \lambda(\theta_n - \theta)$.
Hence the change in prediction is measured by the change in $\lambda\theta_n$,
and $\lambda\,IC(s_i, \eta, \psi)$ measures this change due to deleting $s_i$.
To be meaningful this change must be weighed relative to its standard
deviation, $\lambda[\sum_s IC(s, \eta, \psi)^2\, \eta(s)]^{1/2}$. Thus the shape
of the ratio indicates robustness, as would the prediction sensitivity:

$$\gamma = \max_s \frac{|IC(s, \eta, \psi)|}{[\sum_s IC(s, \eta, \psi)^2\, \eta(s)]^{1/2}}.$$

We can simplify this quantity to show the direct dependence on the score
function by using the formula for the influence curve:

$$\gamma = \max_s \frac{|\psi(s; \theta)|}{[\sum_s \psi(s; \theta)^2\, \eta(s)]^{1/2}},$$

where $\theta$ is evaluated at $\theta(\eta)$.

Now we study the various sensitivities to get a feel for their implications in
IRT using the 1PL, 2PL, and 3PL models as examples. Graphs of these quantities
are useful but require specific values for item parameters and do not lead to
any more profound conclusions than analytic circumspection alone. Graphs are
most useful, however, with actual data, providing diagnostic information on the
fit of the model. We study only the MLE induced by $\{\eta_\theta\}$ and do not
look at general M-type estimators. We also restrict this study to sensitivities
to departures from the parametric model; that is, we let
$\eta_0 = \eta_\theta$ for some value of $\theta$. Huber (1980) remarks that a
better indication of robustness is to allow $\eta$ to roam around a
$P_\epsilon$ neighborhood of $\eta_0$ while looking at the sensitivities. We do
not have the analytical means to do this at this time.

Consider now and for the rest of this section the MLE with operational models
$\{\Pi(\theta; x) : x \in X\}$. With $\eta_0 = \eta_\theta$,
$-m'(\theta; \eta_\theta) = \sum_s \psi(s; \theta)\,\ell(s; \theta)\,\eta(s; \theta)$
and with $\psi(s; \theta) = \ell(s; \theta) = g'(\theta; x)[u - \Pi(\theta; x)]$,
we have

$$IC(s, \eta, \psi) = \frac{g'(\theta; x)\,[u - \Pi(\theta; x)]}{\sum_x g'(\theta; x)^2\, v(\theta; x)\, p(x)}.$$

Define

$$M(\theta; x) = \max\{\Pi(\theta; x),\ 1 - \Pi(\theta; x)\};$$

the various sensitivities to departures from $\eta_0 = \eta_\theta$ are

$$\gamma^* = \frac{\max_x g'(\theta; x)\, M(\theta; x)}{\sum_x g'(\theta; x)^2\, v(\theta; x)\, p(x)},$$

$$\gamma^{**} = \frac{\max_x g'(\theta; x)^2\, M(\theta; x)}{\sum_x g'(\theta; x)^2\, v(\theta; x)\, p(x)}, \quad\text{and}$$

$$\gamma = \frac{\max_x g'(\theta; x)\, M(\theta; x)}{[\sum_x g'(\theta; x)^2\, v(\theta; x)\, p(x)]^{1/2}}.$$
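The three sensitivities are simple to evaluate for a finite generic item pool.
The sketch below is one possible implementation under stated assumptions:
$g(\theta; x)$ is taken to be the logit of $\Pi(\theta; x)$, the items follow a
3PL model with hypothetical parameters (a, b, c), and the function names are
ours, not the paper's. Setting c = 0 gives the 2PL, and a common a the 1PL.

```python
import math

def logistic(t):
    return 1.0 / (1.0 + math.exp(-t))

def item_terms(theta, a, b, c=0.0):
    """Return (g', v, M) for one 3PL item; c = 0 gives the 2PL."""
    r = logistic(a * (theta - b))                  # 2PL part R(theta; x)
    prob = c + (1.0 - c) * r                       # Pi(theta; x)
    v = prob * (1.0 - prob)                        # fails if prob rounds to 0 or 1
    g_prime = (1.0 - c) * a * r * (1.0 - r) / v    # derivative of the logit of Pi
    return g_prime, v, max(prob, 1.0 - prob)

def sensitivities(theta, items, weights):
    """Gross error, change in fit and prediction sensitivities at theta."""
    terms = [item_terms(theta, *item) for item in items]
    info = sum(p * gp * gp * v for p, (gp, v, _) in zip(weights, terms))
    gross = max(gp * m for gp, _, m in terms) / info
    fit = max(gp * gp * m for gp, _, m in terms) / info
    pred = max(gp * m for gp, _, m in terms) / math.sqrt(info)
    return gross, fit, pred

# Hypothetical pool of three equally weighted 3PL items (a, b, c).
pool = [(1.0, -1.0, 0.2), (1.5, 0.0, 0.2), (0.8, 1.0, 0.25)]
print(sensitivities(0.0, pool, [1.0 / 3.0] * 3))
```

Evaluating `sensitivities` over a grid of $\theta$ values gives the kind of
graph the text describes, for whatever item parameters are supplied.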

Example 1. The 2PL item response models have $g'(\theta; x) = a(x)$ where
$\infty > a(x) > 0$. If $a(x) = a_0$, then the models are called the 1PL item
response models. $\gamma^*$ is finite provided $\max a(x)$ is finite. If the
generic item pool $X$ is finite then $\gamma^*$ is always finite. If the
generic item pool is not finite, it is possible that $\sup a(x) = \infty$, but
practical considerations would disallow this because an "infinitely
discriminating item" is rare.

Example 2. The 3PL item response models are defined as
$\Pi(\theta; x) = [1 - c(x)]\,R(\theta; x) + c(x)$ where $0 < c(x) < 1$ and
$R(\theta; x)$ is a 2PL model. Define
$v_1(\theta; x) = R(\theta; x)[1 - R(\theta; x)]$. It can be shown that
$g'(\theta; x) = [1 - c(x)]\,a(x)\,v_1(\theta; x)/v(\theta; x)$. $\gamma^*$ is
finite provided $\max a(x)$ is finite; the discussion in the previous example
applies here too.
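The expression for $g'$ in the 3PL case follows in one line if we assume, as
the phrase "estimated logit" suggests, that $g(\theta; x)$ is the logit of
$\Pi(\theta; x)$ and that $R$ is the logistic 2PL curve with discrimination
$a(x)$ and difficulty $b(x)$. A sketch of the step:

```latex
% Assumes g is the logit of \Pi and R(\theta;x) = 1/(1+\exp\{-a(x)[\theta-b(x)]\}),
% so that R'(\theta;x) = a(x)\,v_1(\theta;x).
\begin{align*}
g(\theta;x)  &= \log\frac{\Pi(\theta;x)}{1-\Pi(\theta;x)}, \\
g'(\theta;x) &= \frac{\Pi'(\theta;x)}{\Pi(\theta;x)\,[1-\Pi(\theta;x)]}
              = \frac{[1-c(x)]\,R'(\theta;x)}{v(\theta;x)}
              = \frac{[1-c(x)]\,a(x)\,v_1(\theta;x)}{v(\theta;x)} .
\end{align*}
```

Setting $c(x) = 0$ gives $v = v_1$ and recovers $g'(\theta; x) = a(x)$ for the
2PL.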

Example 3. For all the 1PL, 2PL, and 3PL models, because of the behavior of
$g'(\theta; x)$, $\gamma^{**}$ and $\gamma$ are finite if and only if
$\max a(x)$ is finite. $\gamma^{**}$ and $\gamma$, but not $\gamma^*$, are
invariant for changes of scale in $a(x)$ and $b(x)$. Presumably $c(x)$ is scale
free as it is a probability of the examinee guessing the correct answer to
item $x$.

The examples lead to the general conclusion that $\gamma$, $\gamma^*$, and
$\gamma^{**}$ are finite if and only if $\max|g'(\theta; x)|$ is finite. For
the 1PL, 2PL, and 3PL models this condition is equivalent to having
$\max a(x)$ finite.

Because the sensitivities change as $\theta$ changes, their variation over the
entire range of practical $\theta$-values should be studied to properly assess
robustness in IRT. This allows for the fact that the MLE procedure must
estimate unique $\theta$ parameters for different subjects. This is in marked
contrast with estimation in logistic regression, which uses the same estimation
procedures as IRT but where the object is to estimate a single $\theta$ (such
as lethal dose 50 or the vector of parameters in one response function).
Because the sensitivities must be viewed globally, procedures that are robust
for logistic regression may not be directly transferable to IRT.

Consider what happens as $|\theta| \to \infty$. The denominator of $\gamma^*$
and $\gamma^{**}$ is Fisher's information; for $\gamma$, it is just the square
root. For extreme $\theta$'s it is reasonable to assume that, for any finite
set of generic items $X$, item responses hold little information about
$\theta$; thus it is probable that the denominators of the sensitivities
approach zero as $|\theta| \to \infty$. Unless the numerators approach zero at
the same or a faster rate than the denominators, the sensitivities will
explode. Applying this idea to each sensitivity, we conclude that $\gamma^*$
always explodes and, for models with $v(\theta; x) \to 0$, $\gamma$ and
$\gamma^{**}$ both explode. Of the models considered before, the 3PL is the
only one having $v(\theta; x) > 0$ as $\theta \to -\infty$; thus $\gamma$ and
$\gamma^{**}$ are bounded for the negative extremes of ability.
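The limiting behavior can be seen numerically in the simplest setting. The
sketch below assumes a pool consisting of a single item with $p(x) = 1$, takes
$g$ to be the logit of $\Pi$, and uses hypothetical parameter values (a = 1,
b = 0, and c = 0.2 for the 3PL). It evaluates the three sensitivities, using
the formulas of this section, at increasingly negative $\theta$.

```python
import math

def single_item_sensitivities(theta, a=1.0, b=0.0, c=0.0):
    """(gross error, change in fit, prediction) for a one-item pool; c = 0 is the 2PL."""
    r = 1.0 / (1.0 + math.exp(-a * (theta - b)))
    prob = c + (1.0 - c) * r
    v = prob * (1.0 - prob)
    big_m = max(prob, 1.0 - prob)
    g_prime = (1.0 - c) * a * r * (1.0 - r) / v
    info = g_prime ** 2 * v                      # Fisher information of the pool
    return (g_prime * big_m / info,              # gross error sensitivity
            g_prime ** 2 * big_m / info,         # change in fit sensitivity
            g_prime * big_m / math.sqrt(info))   # prediction sensitivity

for theta in (-2.0, -4.0, -8.0):
    two_pl = single_item_sensitivities(theta)
    three_pl = single_item_sensitivities(theta, c=0.2)
    print(theta, [round(s, 2) for s in two_pl], [round(s, 2) for s in three_pl])
```

In this sketch the gross error sensitivity keeps growing for both models as
$\theta$ decreases, while the 3PL change in fit and prediction sensitivities
level off because $v(\theta; x)$ stays away from zero.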

These results imply that the MLE procedures are not robust because the maximum
bias in an $\epsilon$-contaminated neighborhood is approximately
$\epsilon\gamma^*$ and $\gamma^*$ is unbounded as $|\theta| \to \infty$; thus,
the MLE cannot tolerate any contamination at extreme $\theta$. The 3PL fares a
little better than the 1PL or 2PL as $\theta \to -\infty$ since its gross error
sensitivity grows a little slower. Thus to achieve full protection one must
look outside the class of MLE procedures, which means we have to sacrifice
efficiency. (Contrast this with the location problem, where the median is the
efficient procedure for logistic errors and is optimal for minimizing the
maximum bias; Huber, 1981.)

5.2 Breakdown Point


The worst possible bias at $\eta_0$ is defined as
$b(1) = \sup|\theta(\xi) - \theta(\eta_0)|$, where the supremum is over all
arbitrary PDFs, $\xi$. Let $P_\epsilon$ be an $\epsilon$-neighborhood of
$\eta_0$. The breakdown point, $\epsilon^*$, is the largest $\epsilon$ for
which $b(\epsilon)$ is less than the worst value:

$$\epsilon^* = \sup\{\epsilon : b(\epsilon) < b(1)\}.$$

The value of $\epsilon^*$ depends on the kind of $P_\epsilon$ chosen; however,
it is sometimes adequate to consider just one kind of neighborhood. In IRT,
$b(1) = \infty$.

We use the following kind of $\epsilon$-neighborhood. Let $0 < \Pi < 1$ and
define $v = \Pi(1 - \Pi)$. Denote the interval
$D(\theta; x) = [\Pi - \epsilon v^{1/2},\ \Pi + \epsilon v^{1/2}]$ when
$\Pi = \Pi(\theta; x)$; $\theta$ is fixed. Denote the subinterval of $[0, 1]$
by $D^*(\theta; x) = D(\theta; x) \cap [0, 1]$. The collection of intervals
$\{D^*(\theta; x)\}$, $x$ fixed, is an $\epsilon$-envelope of the item response
function $\Pi(\theta; x)$. Define
$P_{\epsilon,\theta} = \{\eta : \eta(s) = \Pi^*(x)^u\,[1 - \Pi^*(x)]^{1-u}\,p(x);\ \Pi^*(x) \in D^*(\theta; x)\}$.
It is an $\epsilon$-neighborhood "centered" at $\eta_\theta$.

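Define $b_+(\epsilon) = \sup_\xi [\theta(\xi) - \theta(\eta_0)]$ and
$b_-(\epsilon) = \inf_\xi [\theta(\xi) - \theta(\eta_0)]$. Then
$b(\epsilon) = \max\{b_+(\epsilon),\ -b_-(\epsilon)\}$. We consider
$b_+(\epsilon)$ first.

Let $\eta_0 = \eta_{\theta_0}$. Define
$\Pi_+(\theta_0; x) = \min\{1,\ \Pi + \epsilon v^{1/2}\}$ with
$\Pi = \Pi(\theta_0; x)$. It is clear that
$\Pi_+(\theta_0; x) \in D(\theta_0; x)$ and that $\eta^+_{\theta_0}$, the
corresponding PDF, satisfies $m(\theta; \eta^+_{\theta_0}) \ge m(\theta; \eta)$
for all $\theta$ and all $\eta \in P_{\epsilon,\theta_0}$. The maximum
"positive" bias satisfies

$$b_+(\epsilon) = \inf\{\theta : m(\theta; \eta^+_{\theta_0}) < 0\} - \theta_0.$$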

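We have breakdown if $b_+(\epsilon) = b(1) = \infty$. To avoid this it is
necessary that $\epsilon$ satisfy
$\lim_{\theta \to \infty} m(\theta; \eta^+_{\theta_0}) < 0$. Using the
definition of $m(\theta; \eta)$ we have

$$m(\theta; \eta^+_{\theta_0}) = \sum_x g'(\theta; x)\,[\Pi(\theta_0; x) - \Pi(\theta; x)]\,p(x) + \epsilon \sum_x g'(\theta; x)\,v(\theta_0; x)^{1/2}\,p(x).$$

Letting $\theta \to \infty$ and denoting
$g'(\infty; x) = \lim g'(\theta; x)$ we have an equation for the "positive"
side breakdown:

$$\epsilon_+ = \frac{\sum_x g'(\infty; x)\,[1 - \Pi(\theta_0; x)]\,p(x)}{\sum_x g'(\infty; x)\,v(\theta_0; x)^{1/2}\,p(x)}.$$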

Similarly the "negative" side breakdown is:

$$\epsilon_- = \frac{\sum_x g'(-\infty; x)\,\Pi(\theta_0; x)\,p(x)}{\sum_x g'(-\infty; x)\,v(\theta_0; x)^{1/2}\,p(x)}.$$

And the breakdown:

$$\epsilon^* = \min(\epsilon_+,\ \epsilon_-).$$
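The breakdown formulas are easy to evaluate when $g'$ does not depend on
$\theta$. The sketch below is restricted to the 2PL (and hence the 1PL), for
which $g'(\pm\infty; x) = a(x)$; the item pool and the function name are
hypothetical, chosen only to show how the "positive" and "negative" breakdown
points move with $\theta_0$.

```python
import math

def breakdown_points_2pl(theta0, items, weights):
    """Return (eps_plus, eps_minus, eps_star) for 2PL items given as (a, b) pairs.

    For the 2PL, g'(theta; x) = a(x) for every theta, so the limits of g'
    at plus and minus infinity are both a(x).
    """
    probs = [1.0 / (1.0 + math.exp(-a * (theta0 - b))) for a, b in items]
    denom = sum(p * a * math.sqrt(q * (1.0 - q))
                for p, (a, _), q in zip(weights, items, probs))
    eps_plus = sum(p * a * (1.0 - q)
                   for p, (a, _), q in zip(weights, items, probs)) / denom
    eps_minus = sum(p * a * q
                    for p, (a, _), q in zip(weights, items, probs)) / denom
    return eps_plus, eps_minus, min(eps_plus, eps_minus)

pool = [(1.0, -1.0), (1.0, 0.0), (1.0, 1.0)]   # hypothetical (a, b) values
weights = [1.0 / 3.0] * 3
for theta0 in (0.0, 2.0, 4.0, 8.0):
    print(theta0, breakdown_points_2pl(theta0, pool, weights))
```

As $\theta_0$ grows, every $\Pi(\theta_0; x)$ approaches 1, the numerator of
the "positive" side shrinks faster than its denominator, and the breakdown
point drifts toward zero.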


For a fixed $\theta_0$, all MLE procedures for IRT have a breakdown point that
is not 0 for the $P_\epsilon$ neighborhoods considered thus far. But as
$|\theta_0| \to \infty$ the picture changes: $\epsilon^* = 0$ if either
$\Pi(\theta_0; x) \to 1$ or 0 for all items $x$. Thus the 1PL and 2PL induced
MLEs have zero breakdown, meaning they have no tolerance for departures from
their models. The 3PL induced MLE has zero breakdown, but for
$\theta \to -\infty$, the "negative" sided breakdown is not zero, so it could
tolerate some departure from its model there.

Example. The following displays the "positive" and "negative" breakdown points
for the 3PL model with $a(x) = a_0$ and $c(x) = c_0$.
TABLE 1. ITEM RESPONSE PROBABILITIES

Subject: 5

True    .89    .77    .88    .95
1PL     .88    .88    .79    .88
2PL     .72    .72    .98    .84
3PL     .91    .73    .85    .96

IC  PARAMETERS  OF  MLE' 


Rank order appears in parentheses


